-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail weight update when decommission ongoing and fail decommission when attribute not weighed away #4839
Conversation
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
WeightedRouting weightedRouting = weightedRoutingMetadata.getWeightedRouting(); | ||
if (weightedRouting.attributeName().equals(decommissionAttribute.attributeName())) { | ||
Double attributeValueWeight = weightedRouting.weights().get(decommissionAttribute.attributeValue()); | ||
if (attributeValueWeight == null || attributeValueWeight != 0.0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this value float or double, should we make specific comparisons. For instance Double.valueOf(0.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is Double. using .equals()
server/src/main/java/org/opensearch/cluster/decommission/DecommissionService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/WeightedRoutingService.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
494a7a0
to
05eb55a
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #4839 +/- ##
============================================
- Coverage 70.73% 70.70% -0.03%
- Complexity 57884 57894 +10
============================================
Files 4689 4689
Lines 277296 277314 +18
Branches 40363 40367 +4
============================================
- Hits 196137 196075 -62
- Misses 64873 64968 +95
+ Partials 16286 16271 -15
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
||
public void ensureNoOngoingDecommissionAction(ClusterState state) { | ||
DecommissionAttributeMetadata decommissionAttributeMetadata = state.metadata().decommissionAttributeMetadata(); | ||
if (decommissionAttributeMetadata != null && decommissionAttributeMetadata.status().equals(DecommissionStatus.FAILED) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if status is SUCCESSFUL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only when Decommission has failed, we can allow the weight update can go through. Not for any other status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't SUCCESSFUL
a final state? The error message of this if
condition says: a decommission action is ongoing with status...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A succesful decommission state means the transport execution has completed. The nodes are decommissioned in this state (not able to start pre voting or join). Hence DecommissionAction ongoing means a zone is decommissioned and the cluster rejects pings from decommissioned node
Build is failing, please fix the build. |
|
||
public void ensureNoOngoingDecommissionAction(ClusterState state) { | ||
DecommissionAttributeMetadata decommissionAttributeMetadata = state.metadata().decommissionAttributeMetadata(); | ||
if (decommissionAttributeMetadata != null && decommissionAttributeMetadata.status().equals(DecommissionStatus.FAILED) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If decommission status is FAILED can certain nodes be out of service or decommissioned while others not as in partially failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if the decommission is FAILED
, joins or prevoting is not blocked on those nodes and hence if some nodes are out of service (which can happen), the weights request can go through
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the zone isn't fully equipped to take traffic right like 90% nodes are out of service and we place weights 10:1:1 when 10 is for the zone only operating at 10% capacity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment, LGTM otherwise
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
…en attribute not weighed away (opensearch-project#4839) * Add checks for decommission before setting weights Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (#4839) * Add changes for graceful node decommission (#4586) * Add delay timeout for decommission request (#4931) * Integ Tests for Awareness Attribute Decommissioning (#4715) Signed-off-by: Rishab Nahata <rnnahata@amazon.com> Signed-off-by: pranikum <109206473+pranikum@users.noreply.github.com>
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (opensearch-project#4839) * Add changes for graceful node decommission (opensearch-project#4586) * Add delay timeout for decommission request (opensearch-project#4931) * Integ Tests for Awareness Attribute Decommissioning (opensearch-project#4715) Signed-off-by: Rishab Nahata <rnnahata@amazon.com> Signed-off-by: pranikum <109206473+pranikum@users.noreply.github.com>
…en attribute not weighed away (opensearch-project#4839) * Add checks for decommission before setting weights Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Description
Issues Resolved
#4838
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.